Accommodating the uncertain latency of load instructions is one of the most vexing problems in in-order
microarchitecture design and compiler development. Compilers can generate schedules with a high degree of in
level parallelism but cannot effectively accommodate unanticipated latencies; incorporating traditional out-of-o
execution into the microarchitecture hides some of this latency but redundantly performs work done by the com
additional pipeline stages. Although effectiv ...

2 Formal verification

Christoph Kern, Mark R. Greenstreet 

April 1999 ACM Transactions on Design Automation of Electronic Systems (TODAES), volume 4 issue 2 


In recent years, formal methods have emerged as an alternative approach to ensuring the quality and correct 
designs, overcoming some of the limitations of traditional validation techniques such as simulation and testing 
main aspects to the application of formal methods in a design process: the formal framework used to specify 
properties of a design and the verification techniques and tools used to reason about the relationship between 

Keywords: case studies, formal methods, formal verification, hardware verification, language containment, m 
survey, theorem proving 
Compiler transformations for high-performance computing
December 1994 ACM Computing Surveys (CSUR), volume 26 issue 4
In the last three decades a large number of compiler transformations for optimizing programs have been impl
optimizations for uniprocessors reduce the number of instructions executed by the program using transformat 
the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance sup
and parallel processors maximize parallelism and memory locality with transformations that rely on tracking t 

...

Keywords: compilation, dependence analysis, locality, multiprocessors, optimization, parallelism, superscala 
vectorization
March 2003 ACM Computing Surveys (CSUR), volume 35 issue 1


Hardware multithreading is becoming a generally applied technique in the next generation of microprocessors 
multithreaded processors are announced by industry or already into production in the areas of high-performa 
microprocessors, media, and network processors. A multithreaded processor is able to pursue two or more thr 
parallel within the processor pipeline. The contexts of two or more threads of control are often stored in separ 
register sets. Unused i ... 

Keywords: Blocked multithreading, interleaved multithreading, simultaneous multithreading 
on Computer architecture, volume 24 issue 2

Many studies have investigated performance improvement through exploiting instruction-level parallelism (ILP 
particular architecture. Unfortunately, these studies indicate performance improvement using the number of c 
required to execute a program, but do not quantitatively estimate the penalty imposed on the cycle time from 
Since the performance of a microprocessor must be measured by its execution time, a cycle time evaluation i 
as a cy ... 
Jeffrey Oplinger, Monica S. Lam

October 2002 Proceedings of the 10th international conference on Architectural support for programm 

and operating systems, volume 37, 36, 30 issue 10, 5, 5

This paper advocates the use of a monitor-and-recover programming paradigm to enhance the reliability of so
proposes an architectural design that allows software and hardware to cooperate in making this paradigm mo 
easier to program. We propose that programmers write monitoring functions assuming simple sequential exec 
Our architecture speeds up the computation by executing the monitoring functions speculatively in parallel wi 
computation. For ... 
May 2000 ACM Transactions on Computer Systems (TOCS), volume 18 issue 2


The large address space needs of many current applications have pushed processor designs toward 64-bit wo 
Although full 64-bit addresses and operations are indeed sometimes needed, arithmetic operations on much s 
are still more common. In fact, another instruction set trend has been the introduction of instructions geared 
operations on 16-bit quantities. For example, most major processors now include instruction set support for
operation ... 

System-level power optimization: techniques and tools
Luca Benini, Giovanni de Micheli 

April 2000 ACM Transactions on Design Automation of Electronic Systems (TODAES), volume 5 issue 2 
This tutorial surveys design methods for energy-efficient system-level design. We consider electronic systems
hardware platform and software layers. We consider the three major constituents of hardware that consume e 
computation, communication, and storage units, and we review methods of reducing their energy consumptio 
models for analyzing the energy cost of software, and methods for energy-efficient software design and comp 
survey ...
Michael Upton, Thomas Huff, Trevor Mudge, Richard Brown

November 1994 Proceedings of the sixth international conference on Architectural support for program 
languages and operating systems, volume 29, 28 issue 11, 5


This paper discusses the design of a high clock rate (300 MHz) processor. The architecture is described, and th
design are explained. The performance of three processor models is evaluated using trace-driven simulation, 
used to estimate the resources required to build processors with varying sizes of on-chip memories. In both s
issue models. Recommendations are then made to increase the effectiveness of each of the models. 

Keywords: decoupled architecture, floating point latencies, nonblocking cache, pipelining, prefetching, resou 
superscalar 
April 1998 ACM SIGARCH Computer Architecture News, Proceedings of the 25th annual internation
on Computer architecture, volume 26 issue 3 


Explicitly Parallel Instruction Computing (EPIC) architectures require the compiler to express program instruct 
parallelism directly to the hardware. EPIC techniques which enable the compiler to represent control speculati 
dependence speculation, and predication have individually been shown to be very effective. However, these te 
not been studied in combination with each other. This paper presents the IMPACT EPIC Architecture to addres 
involved in design ... 
June 1987 Proceedings of the 14th annual international symposium on Computer architecture


In this paper, the WISQ architecture is described. This architecture is designed to achieve high performance b
compiler technology and using a highly segmented pipeline. By having a highly segmented pipeline, a very-hi 
can be used. Since a highly segmented pipeline will require relatively long pipelines, a way must be provided 
effects of pipeline bubbles that are formed due to data and control dependencies. It is also important to provi

... detection of commun ...
November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies on Collaborativ


Understanding distributed applications is a tedious and difficult task. Visualizations based on process-time dia 
used to obtain a better understanding of the execution of the application. The visualization tool we use is Poe 
developed at the University of Waterloo. However, these diagrams are often very complex and do not provide 
the desired overview of the application. In our experience, such tools display repeated occurrences of non-triv 

September 2001 Journal on Educational Resources in Computing (JERIC) 

Specification and verification of pipelining in the ARM2 RISC microprocessor
James K. Huggins, David Van Campenhout

October 1998 ACM Transactions on Design Automation of Electronic Systems (TODAES), volume 3 issue 4 

Gurevich Abstract State Machines (ASMs) provide a sound mathematical basis for the specification and verific 
An application of the ASM methodology to the verification of a pipelined microprocessor (an ARM2 implementa 
described. Both the sequential execution model and final pipelined model are formalized using ASMs. A series 
models are introduced that gradually expose the complications of pipelining. The first intermediate model is p 
t ... 

Keywords: ARM processor, abstract state machines, design verification, formal verification, pipelined process 
May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd annual internatio

Speculative execution is execution of instructions before it is known whether these instructions should be exe 
based speculative execution has the potential to achieve both a high instruction per cycle rate and high clock 
compiler-based approaches, however, have greatly limited instruction scheduling due to a limited ability to ha 
of speculative execution. Significant performance improvement is, thus, difficult in non-numerical applications 
June 2001 Proceedings of the 38th conference on Design automation


This paper describes the first application of the Genevieve test generation methodology. The Genevieve appro 
formal techniques derived from "model-checking" to generate test suites for specific behaviours of the design
"interesting" behaviour is claimed to be unreachable. If a path from an initial state to the state of interest d
counter-example is generated. The sequence of states specifies a test for the desired behaviour. ... 
Michael D. Smith, Mark Horowitz, Monica S. Lam

September 1992 ACM SIGPLAN Notices, Proceedings of the fifth international conference on Architectu
programming languages and operating systems, volume 27 issue 9 
The foremost goal of superscalar processor design is to increase performance through the exploitation of instr
parallelism (ILP). Previous studies have shown that speculative execution is required for high instruction per c 
in non-numerical applications. The general trend has been toward supporting speculative execution in compile 
dynamically-scheduled processors. Performance, though, is more than just a high IPC rate; it also depends up 
count ... 
July 1998 Proceedings of the 12th international conference on Supercomputing
Juan L. Aragon, Jose Gonzalez, Antonio Gonzalez, James E. Smith

June 2002 Proceedings of the 16th international conference on Supercomputing 
The reasons for performance losses due to conditional branch mispredictions are first studied. Branch mispred
are broken into three categories: pipeline-fill penalty, window-fill penalty, and serialization penalty. The first a 
produce most of the performance loss, but the second is also significant. Previously proposed dual (or multi) p
methods attempt to reduce all three penalties, but these methods are also quite complex. Most of the comple 

Keywords: branch misprediction penalty, confidence estimation, dual path processing, pre-scheduling 